Matrices, Vector Spaces, and Information Retrieval
Abstract
The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking. Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user’s query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections.
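As a rough illustration of the vector space idea described in the abstract, the following minimal sketch models a tiny collection as a term-by-document matrix, represents a query as a vector, and ranks documents by cosine similarity. The vocabulary, the matrix entries, and the helper names `cosine_scores` and `low_rank` are hypothetical and chosen only for this example. The truncated SVD used here is one possible orthogonal factorization for building a reduced-rank model and is not necessarily the one the paper develops.

```python
import numpy as np

# Hypothetical vocabulary (rows) and three toy documents (columns).
terms = ["matrix", "vector", "library", "query"]
A = np.array([
    [1.0, 0.0, 1.0],  # "matrix" occurs in documents 0 and 2
    [1.0, 0.0, 0.0],  # "vector" occurs in document 0
    [0.0, 1.0, 0.0],  # "library" occurs in document 1
    [0.0, 1.0, 1.0],  # "query" occurs in documents 1 and 2
])


def cosine_scores(term_doc, query):
    """Cosine of the angle between the query and each document column."""
    doc_norms = np.linalg.norm(term_doc, axis=0)
    return (term_doc.T @ query) / (doc_norms * np.linalg.norm(query))


def low_rank(term_doc, k):
    """Rank-k approximation from a truncated SVD (an assumed choice of factorization)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# A query asking for the terms "matrix" and "vector".
q = np.array([1.0, 1.0, 0.0, 0.0])

scores = cosine_scores(A, q)
ranking = np.argsort(-scores)  # document indices, best match first

# Scores computed against a rank-2 model of the same collection.
A2 = low_rank(A, 2)
scores_rank2 = cosine_scores(A2, q)

print("cosine scores:", scores)
print("ranking:", ranking)
print("rank-2 scores:", scores_rank2)
```

In this toy collection, document 0 contains exactly the query terms and ranks first, document 2 shares one term and ranks second, and document 1 shares none. Replacing `A` with a reduced-rank approximation changes the scores, which is the sense in which a factorization can absorb some of the noise in the original matrix.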
Related resources
Vectors, Planes and Context
Information Retrieval (IR) models based on vector spaces have been investigated for a long time. Nevertheless, they have recently attracted further research interest beyond the classical statistical view of vectors and matrices. Moreover, “context” has been recognized as a crucial component of IR systems. As the way context affects IR systems is very complex, a principled approach to modeling a...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Exploring the relationship between feature and perceptual visual spaces
visual information (images or videos) is increasing and thereby demanding appropriate ways to represent and search these information spaces. Their visualization often relies on reducing the dimensions of the information space to create a lower-dimensional feature space which, from the point-of-view of the end user, will be viewed and interpreted as a perceptual space. Critically for information...
s-Topological vector spaces
In this paper, we define and study a generalized form of topological vector spaces called s-topological vector spaces. s-topological vector spaces are defined by using semi-open sets and semi-continuity in the sense of Levine. Along with other results, it is proved that every s-topological vector space is a generalized homogeneous space. Every open subspace of an s-topological vector space is...
Journal title:
- SIAM Review
Volume 41, Issue
Pages -
Publication date 1999